Conference Talk · 2026 · 01 / 15

The Anatomy of an AI Agent

with a focus on document agents

Clelia Astra Bertelli  ·  LlamaIndex
Voxel51 Meetup Stuttgart · 21st April 2026
Intro
Brain
Loop
World
Recap
Introduction · 02 / 15

Agents are everywhere

  • 2025 saw AI agents move from research curiosity to everyday tools
  • Manus, Claude Code, Gemini, ChatGPT operator mode: each changed how we work
  • 2026: agentic workflows are becoming the default, not the exception
  • They browse the web, write code, process documents, manage tasks autonomously

"From Manus to Claude Code, AI agents are making their way into the everyday life of every one of us."

Introduction · 03 / 15

What makes a good AI agent?

In this talk, we'll answer that question by going through the anatomy of one. We'll explore:

  • How it thinks & acts — structured reasoning and tool use
  • How it interacts with the world — filesystem, tools, chat interfaces
  • How to control its flow — event-driven loops and step execution
  • How to give it the right context — avoiding hallucinations, staying safe
The Brain · 04 / 15

The Brain: LLM as Foundation

  • The LLM is the main actor: it produces thoughts, observations and actions
  • Given the previous chat history, it reasons about what to do next
  • Problem: LLMs are non-deterministic by nature
  • Solution: structured outputs, i.e. JSON schemas that constrain what the model can return
The Brain · 05 / 15

Steering the LLM: Structured Outputs

Every LLM call in the agent uses structured generation: the model must respond according to a predefined JSON schema, which makes its outputs machine-readable and reliable.

  • No free-form prose: every response has a typed, schema-validated structure
  • Schemas are purpose-built per operation: think, act, observe, stop
  • Forces clear separation of reasoning and action
  • The agent wrapper exposes only structured generation methods
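As a rough illustration of "one schema per operation", here is a minimal sketch using only the Python standard library. The class names, field names and the parse_reply helper are assumptions made for this example; they are not the schemas or the wrapper used in the actual agent, which may well rely on a validation library instead.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Think:
    thought: str


@dataclass
class Act:
    tool_name: str
    tool_input: dict


@dataclass
class Observe:
    observation: str


@dataclass
class Stop:
    final_answer: str


# One purpose-built schema per operation (hypothetical field names).
SCHEMAS = {"think": Think, "act": Act, "observe": Observe, "stop": Stop}


def parse_reply(operation: str, raw: str):
    """Validate a raw model reply against the schema for `operation`."""
    schema = SCHEMAS[operation]
    data = json.loads(raw)  # free-form prose fails here with a ValueError
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    expected = {f.name: f.type for f in fields(schema)}
    if set(data) != set(expected):
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(data)}")
    for name, typ in expected.items():
        if not isinstance(data[name], typ):
            raise ValueError(f"{name} must be of type {typ.__name__}")
    return schema(**data)


action = parse_reply(
    "act", '{"tool_name": "parse", "tool_input": {"path": "report.pdf"}}'
)
print(action.tool_name, action.tool_input)
```

Anything that is not valid JSON, or that has the wrong keys or types, is rejected before it can reach a tool, which is the point of keeping reasoning and action in separate, typed structures.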
The Brain · 06 / 15

The Four Operations

Think: Reason about the next action based on history and current state
Act: Call a tool (based on the thought trace produced in the previous step)
Observe: Evaluate tool results and update the agent's understanding
Stop: Recognize task completion and exit the loop gracefully

These four operations, driven by structured outputs, are the base for the agent loop.

The Loop · 07 / 15

The Loop: Task Executor

The agent loop is a LlamaIndex Agent Workflow — an event-driven, stepwise execution engine that connects the LLM to the external world.

  • A user prompt arrives as an input event, signalling the start of the workflow
  • Each step dispatches a typed event to the next: think leads to act or stop, act leads to observe, and observe leads back to think
  • Tool results feed back as observations, restarting the think step
  • The loop runs until the LLM produces a Stop (after at least one full cycle)
The Loop · 08 / 15

Event-Driven Execution

Input: User prompt received
Think: Reason about next action
Act: Call a tool
Observe: Process tool result
Stop: Task complete

After each Observe, the loop restarts from Think — until the LLM decides all tasks and sub-tasks are done and produces a Stop event.
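The event flow above can be sketched in plain Python. This is not the LlamaIndex Workflows API: the event classes, the scripted_llm stand-in, the word_count tool and the max_steps budget are all hypothetical, chosen only to show how typed events route control from one step to the next.

```python
from dataclasses import dataclass


@dataclass
class InputEvent:
    prompt: str


@dataclass
class ThinkEvent:
    history: list


@dataclass
class ActEvent:
    tool_name: str
    tool_input: str


@dataclass
class ObserveEvent:
    result: str


@dataclass
class StopEvent:
    answer: str


def scripted_llm(history):
    """Stand-in for the model: call one tool, then stop."""
    if not any(entry.startswith("observation:") for entry in history):
        return ActEvent(tool_name="word_count", tool_input=history[0])
    return StopEvent(answer=history[-1])


# Hypothetical tool registry with a single toy tool.
TOOLS = {"word_count": lambda text: str(len(text.split()))}


def run(prompt, llm=scripted_llm, max_steps=10):
    """Dispatch events step by step until a StopEvent is handled."""
    history = []
    event = InputEvent(prompt)
    for _ in range(max_steps):
        if isinstance(event, InputEvent):
            history.append(event.prompt)
            event = ThinkEvent(history)
        elif isinstance(event, ThinkEvent):
            event = llm(event.history)  # yields an ActEvent or a StopEvent
        elif isinstance(event, ActEvent):
            event = ObserveEvent(TOOLS[event.tool_name](event.tool_input))
        elif isinstance(event, ObserveEvent):
            history.append(f"observation: {event.result}")
            event = ThinkEvent(history)  # restart from Think
        elif isinstance(event, StopEvent):
            return event.answer
    raise RuntimeError("step budget exhausted before a Stop event")


print(run("parse the quarterly report"))
```

The step budget is an addition of this sketch, not something the talk describes; it simply keeps a misbehaving model from looping forever.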

World · 09 / 15

Windows on the World

The LLM loop is a generalist architecture. What defines a document processing agent are the interfaces through which it interacts with the external environment.

Filesystem: The source of documents, where files are read, written and managed
Tools: Document parsing, extraction and classification via LlamaParse
Chat Interface: In our case, a Telegram bot for async communication and document uploads
World · Filesystem · 10 / 15

The Eyes: Virtual Filesystem

  • The agent uses AgentFS: a virtualized filesystem, not the host machine's real one
  • Full FS access for autonomous agents is dangerous, especially on shared machines
  • Exposed operations: read, write and edit only; no delete, no shell execution
  • Scope is limited to the working directory and its children

Even if the agent is tricked into writing malicious files, the virtual FS absorbs the damage. The real filesystem stays untouched unless you explicitly sync it with the virtual one.
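To make the idea concrete, here is a toy in-memory filesystem with the same restrictions: read, write and edit only, scoped to one working directory. It is a sketch under stated assumptions and does not reproduce the AgentFS API; the VirtualFS class, its method names and the /work default are hypothetical.

```python
from pathlib import PurePosixPath


class VirtualFS:
    """In-memory file store scoped to a single working directory."""

    def __init__(self, workdir="/work"):
        self.workdir = PurePosixPath(workdir)
        self.files = {}  # maps resolved path -> text content

    def _resolve(self, path):
        # Join onto the working directory, then collapse ".." by hand,
        # since pure paths never touch (or normalise against) a real disk.
        parts = []
        for part in (self.workdir / path).parts:
            if part == "..":
                if parts:
                    parts.pop()
            else:
                parts.append(part)
        resolved = PurePosixPath(*parts)
        if self.workdir not in resolved.parents:
            raise PermissionError(f"{path} is outside {self.workdir}")
        return resolved

    def read(self, path):
        return self.files[self._resolve(path)]  # KeyError if missing

    def write(self, path, content):
        self.files[self._resolve(path)] = content

    def edit(self, path, old, new):
        target = self._resolve(path)
        text = self.files[target]
        if old not in text:
            raise ValueError("text to replace not found")
        self.files[target] = text.replace(old, new, 1)

    # Deliberately no delete() and no way to run shell commands.


fs = VirtualFS()
fs.write("notes/todo.txt", "parse invoice")
fs.edit("notes/todo.txt", "invoice", "contract")
print(fs.read("notes/todo.txt"))
```

Because nothing here is backed by the host disk, a write to an unexpected location inside the working directory only changes a dictionary entry, and paths that escape the working directory are rejected outright.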

World · Tools · 11 / 15

The Limbs: Document Tools

Beyond plain-text file ops, the document agent uses LlamaParse tools that genuinely understand unstructured content (PDFs, Word docs, PowerPoint, Excel and more).

Parse: Full text parsing using OCR, VLMs and agentic approaches for any file format
Extract: Structured data extraction following a custom JSON schema you define
Classify: Document classification into user-defined categories at scale
World · Chat Interface · 12 / 15

The Ears & Mouth: Chat Interface

  • Users send .txt, .pdf, .docx documents via Telegram — the agent downloads them to AgentFS
  • Text messages trigger agent tasks with full autonomous execution
  • Key design choice: asynchronous (no loading spinners, push notifications when done)
  • Perfectly matches LlamaIndex Workflows, which are async-first by design

Document workflows can take minutes to half an hour. The agent works in the background and pings you when finished: just like a colleague, not a spinner :)
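The "acknowledge now, notify later" pattern can be sketched with asyncio alone. No Telegram or LlamaIndex calls are used here: process_document, handle_message and the outbox queue are hypothetical stand-ins for the agent run, the bot's message handler and the push notification.

```python
import asyncio


async def process_document(name):
    """Placeholder for a long-running agent task."""
    await asyncio.sleep(0.01)  # stands in for minutes of parsing and extraction
    return f"finished {name}"


async def handle_message(name, outbox, background):
    """Acknowledge at once, then let the job run in the background."""

    async def job():
        result = await process_document(name)
        await outbox.put(result)  # the "ping when finished"

    task = asyncio.create_task(job())
    background.add(task)  # keep a reference so the task is not garbage-collected
    task.add_done_callback(background.discard)
    return f"received {name}, working on it"


async def main():
    outbox = asyncio.Queue()
    background = set()
    ack = await handle_message("contract.pdf", outbox, background)
    done = await outbox.get()  # arrives later, without blocking the handler
    return ack, done


print(asyncio.run(main()))
```

The handler returns its acknowledgement before the work is done, so the user is never left waiting on a blocked request; the result shows up later as a separate message.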

Recap · Safety · 13 / 15

A Note on Safety

  • Virtual filesystem — no access to the real machine FS
  • No shell execution — agent cannot run bash commands
  • Read / write / edit only — no delete, no destructive operations
  • No skill execution — behavior is extended via AGENTS.md, not runnable code

This is not a 100% guarantee. Prompt injection can still exfiltrate data from documents the agent has access to. Always monitor what the agent sees and how it behaves.

Recap · Anatomy · 14 / 15

The Full Anatomy

The Brain (LLM): thinks, observes and acts via structured outputs
The Eyes (Virtual filesystem): the agent's source of documents
The Limbs (Tools): LlamaParse, LlamaExtract, LlamaClassify
Ears & Mouth (Chat interface): listens to input, replies asynchronously
Recap · Takeaways · 15 / 15

Key Takeaways

  • Structured outputs are what make LLM behaviour reliable inside an agent
  • Event-driven loops turn a model into an autonomous, self-reflecting task executor
  • Virtual filesystems enable safe, bounded document access without real-FS risk
  • Async chat interfaces unlock long-running document workflows without blocking the user
Clelia Astra Bertelli  ·  clelia@runllama.ai  ·  LinkedIn  ·  X  ·  Personal Website
Thank you! Questions?